I am going to make some plotly plots based on the Instacart dataset.

Load packages

library(tidyverse)
library(p8105.datasets)
library(plotly)
library(RColorBrewer)

Let’s get some data and preprocess it.

data("instacart")

set.seed(12345)

instacart_sub = instacart %>% 
  sample_frac(0.25)

The instacart dataset contains 1,384,617 observations. To work with a smaller dataset, I take a 25% random sample (instacart_sub), with the seed fixed so that the sample is reproducible.
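As a minimal sketch of why the seed matters, the snippet below draws the same 25% sample twice from a small toy table. The `toy` tibble and its 1,000 rows are hypothetical stand-ins for the instacart data, not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy table standing in for the instacart data (1,000 rows)
toy = tibble(order_id = 1:1000)

# Draw a 25% sample twice, resetting the seed before each draw
set.seed(12345)
sub_a = toy %>% sample_frac(0.25)

set.seed(12345)
sub_b = toy %>% sample_frac(0.25)

# With the same seed, both draws pick exactly the same rows
identical(sub_a, sub_b)

# 25% of 1,000 rows
nrow(sub_a)
```

In recent versions of dplyr, `slice_sample(prop = 0.25)` is the documented successor to `sample_frac()`; it is not guaranteed to select the same rows as `sample_frac()` under the same seed.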

Plotly bar plot

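# Count ordered items per aisle, keeping the department for coloring
instacart_sub %>% 
  group_by(department, aisle) %>% 
  summarise(
    order_count = n()
  ) %>% 
  # Sort ascending so the least-ordered aisles come first, then keep 20
  arrange(order_count) %>% 
  ungroup() %>% 
  head(20) %>% 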
  mutate(
    aisle = str_to_title(aisle),
    department = str_to_title(department),
    aisle = fct_reorder(aisle, order_count)
  ) %>%
  plot_ly(
    x = ~ order_count,
    y = ~ aisle,
    color = ~ department,
    type = "bar",
    colors = "Accent"
  ) %>% 
  layout(
    title = "20 Least Popular Aisles",
    xaxis = list(title = "Number of Items Ordered"),
    yaxis = list(title = "Name of Aisle"),
    legend = list(title = list(text = "Department")),
    autosize = FALSE
  )

This bar plot shows the 20 least popular aisles across all 21 departments. For instance, products from the Beauty aisle were ordered only 63 times in this 25% random sample of the instacart dataset.
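The counting step behind the bar plot can be sketched on a tiny example. The `toy` tibble below is hypothetical (one row per ordered item, as in the instacart data), and `count()` is used here as a shorthand for the `group_by()` plus `summarise(n())` combination in the plot code.

```r
library(dplyr)

# Hypothetical toy data: one row per ordered item
toy = tibble(
  department = c("produce", "produce", "produce",
                 "personal care", "personal care", "personal care"),
  aisle      = c("fresh fruits", "fresh fruits", "fresh fruits",
                 "soap", "soap", "beauty")
)

least_popular = toy %>%
  # Number of rows (ordered items) per department/aisle pair
  count(department, aisle, name = "order_count") %>%
  # Fewest orders first
  arrange(order_count) %>%
  # Keep the two least-ordered aisles (the plot keeps 20)
  head(2)

least_popular
```

Because `head()` simply cuts after the requested number of rows, aisles tied at the cutoff are included or excluded by row order alone.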

Plotly box plot

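instacart_sub %>% 
  mutate(
    # order_dow is coded 0-6; shift to 1-7 so wday() can label it
    # (assumes 0 = Sunday, matching wday()'s default week start)
    order_dow = order_dow + 1,
    order_dow = lubridate::wday(order_dow, label = TRUE, abbr = FALSE)
  ) %>% 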
  plot_ly(
    x = ~ order_dow,
    y = ~ order_hour_of_day,
    type = "box"
  ) %>% 
  layout(
    title = "Distribution of Order Time by Day of the Week",
    xaxis = list(title = "Day of Week"),
    yaxis = list(title = "Hour of Day")
  )

This box plot shows the distribution of the hour of day at which items were ordered, for each day of the week. People tend to place orders somewhat earlier on Mondays than on other days.
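A reading like this can be checked numerically by summarising the order hour for each day. The sketch below does so on hypothetical toy values, and it assumes that `order_dow` is coded with 0 = Sunday; the labels are set with `factor()` instead of `lubridate::wday()` so that they do not depend on the locale.

```r
library(dplyr)

# Assumed coding: 0 = Sunday, ..., 6 = Saturday
day_names = c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")

# Hypothetical toy data: three orders on Sunday, three on Monday
toy = tibble(
  order_dow         = c(0, 0, 0, 1, 1, 1),
  order_hour_of_day = c(10, 14, 16, 9, 11, 13)
)

hour_summary = toy %>%
  mutate(order_dow = factor(order_dow, levels = 0:6, labels = day_names)) %>%
  group_by(order_dow) %>%
  summarise(
    median_hour = median(order_hour_of_day),  # center line of each box
    iqr_hour    = IQR(order_hour_of_day)      # height of each box
  )

hour_summary
```

Running the same summary on instacart_sub would give the medians and interquartile ranges that the boxes display.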